We are also now offering dedicated instances for users who want deeper control over the specific model version and system performance. By default, requests are run on compute infrastructure shared with other users, who pay per request. Our API runs on Azure, and with dedicated instances, developers will pay by time period for an allocation of compute infrastructure that’s reserved for serving their requests. Developers get full control over the instance’s load (higher load improves throughput but makes each request slower), the option to enable features such as longer context limits, and the ability to pin the model snapshot. Dedicated instances can make economic sense for developers running beyond ~450M tokens per day. Additionally, it enables directly optimizing a developer’s workload against hardware performance, which can dramatically reduce costs relative to shared infrastructure. For dedicated instance inquiries, contact us.
|